The National Randomized Field Trial of Success for All: Second-Year Outcomes 



Abstract 

This paper reports the second-year outcomes of the national randomized 
evaluation of Success for All, a comprehensive reading reform model. Analyses focus on 
literacy outcomes for a two-year longitudinal student sample and a combined longitudinal 
and in-mover student sample, both of which were nested within 38 schools. Using a 
cluster randomization design, schools were randomly assigned to implement Success for 
All or control methods. Hierarchical linear model analyses for the longitudinal sample 
revealed statistically significant school-level effects of assignment to Success for All on 
three of the four literacy outcomes measured. These impacts were as large as one quarter 
of a standard deviation — or a learning advantage relative to controls exceeding half of a 
school year — on the Word Attack outcome. Impact estimates for the combined 
longitudinal and in-mover sample were somewhat smaller in magnitude and more 
variable. The results correspond with the Success for All program theory, which targets 
school-level achievement effects through a multi-year sequencing of intensive literacy 
instruction. 

Success for All is one of the most widely implemented, widely researched, and widely 
critiqued educational interventions in the United States today. More than 1,200 schools, mostly 
high-poverty Title I schools, in 46 states are currently implementing the program with external 
assistance provided by the not-for-profit Success for All Foundation. The intervention is 
purchased as a comprehensive package, which includes materials, training, ongoing professional 
development, and a well-specified “blueprint” for delivering and sustaining the model. Schools 
that elect to adopt Success for All implement a whole-school program for students in grades pre-K 
through 5 that organizes resources in an effort to ensure that every child will reach the third grade 
on time with adequate basic skills and will continue to build on those skills throughout the later 
elementary grades. 

The program focuses on prevention and early, intensive intervention designed to detect 
and resolve reading problems as early as possible, before they become serious. Students in 
Success for All schools spend most of their day in traditional, age-grouped classes, but are 
regrouped across grades for reading lessons targeted to specific performance levels. Using the 
program’s benchmark assessments, teachers assess each student’s reading performance at eight-week 
intervals and make regrouping changes based on the results. Instead of being placed in 
special classes or retained in grade, most students who need additional help receive one-on-one 
tutoring to get them back on track. 

The use of cooperative learning methods also helps children develop academic skills and 
encourages them to engage in teambuilding activities and other tasks that deal explicitly with the 
development of interpersonal and social skills. In addition, a Success for All school establishes a 
Solutions Team, serving to increase parents’ participation in school generally, to mobilize 
integrated services to help Success for All families and children, and to identify and address 
particular problems such as irregular attendance, vision correction, or problems at home. 

Finally, each Success for All school designates a full-time Program Facilitator who oversees the 
daily operation of the program, provides assistance where needed, and coordinates the various 
components. These are the main features of Success for All, both as originally conceived in 1987 
and as currently disseminated (see Appendix for more details concerning the Success for All 
program components). 

Of the 33 comprehensive school reform programs reviewed in a recent meta-analysis by 
Borman, Hewes, Overman, and Brown (2003), Success for All was one of only three that had 
positive and statistically significant achievement effects across a large number of rigorous quasi- 
experimental studies. The evidence reviewed by Borman and colleagues from 46 separate quasi- 
experimental comparison-group evaluations of Success for All and its sister program, Roots and 
Wings, from across the United States revealed an overall achievement effect of one fifth of one 
standard deviation (d = .20). In addition, quasi-experimental evidence from a study by Borman 
and Hewes (2002) demonstrated that students’ multi-year participation in Success for All was 
associated with notable gains in reading and other school-based outcomes that were sustained 
through the completion of eighth grade, several years after they had left the Success for All 
elementary schools. 

Though compelling in terms of its scope and results, this prior research has several 
important limitations. First, the vast majority of these previous studies of Success for All have 
used a quasi-experimental matched comparison-group design, in which experimental and control 
schools were matched on pretests and other demographic characteristics and then compared each 
year on the subsequent posttests. In recent years, an increasing body of evidence has emerged 
suggesting that such comparison-group studies in social policy (e.g., employment, training, 
welfare-to-work, education) often produce inaccurate estimates of an intervention’s effects, 
because of unobservable differences between the intervention and comparison groups that 
differentially affect their outcomes (Glazerman, Ley, & Myers, 2002). 

Further, the authors of nearly all previous studies of Success for All have employed 
designs that attempt to match program and control schools, but have specified the student as the 
unit of analysis in statistical comparisons of program and control outcomes. Though this unit-of-analysis 
problem does not necessarily bias the impact estimates, it does underestimate the 
standard errors of the estimates and leads researchers to reject the null hypothesis too often. Finally, 
because the early studies by the developers, and by many of the third-party evaluators, have 
involved small numbers of treatment sites, much of the prior work on Success for All may 
correspond with that which Cronbach et al. (1980) termed the “superrealization” stage of 
program development. That is, with the researchers actively involved in assuring that they are 
studying high-quality implementations in a select number of schools, some of these earlier 
evaluations may represent assessments of what Success for All can accomplish at its best. The 
extent to which these results may generalize across broader implementations, though, is of some 
concern. 
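To make the unit-of-analysis problem concrete, the following sketch uses the Kish design effect for equal-sized clusters, DEFF = 1 + (m - 1)ρ, where m is the number of students per school and ρ is the intraclass correlation of the outcome. It shows how much a student-level analysis understates standard errors when students are clustered in schools. The cluster size and ICC below are hypothetical illustrations, not estimates from any Success for All study.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1.0) * icc


def se_inflation(cluster_size: float, icc: float) -> float:
    """Factor by which a student-level standard error understates the
    standard error that accounts for clustering of students in schools."""
    return math.sqrt(design_effect(cluster_size, icc))


def effective_sample_size(n_students: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is roughly 'worth'."""
    return n_students / design_effect(cluster_size, icc)


# Hypothetical illustration: 40 schools of 80 students each, ICC = 0.15.
n_schools, per_school, icc = 40, 80, 0.15
n_students = n_schools * per_school
print(f"design effect:         {design_effect(per_school, icc):.2f}")
print(f"SE understated by:     {se_inflation(per_school, icc):.2f}x")
print(f"effective sample size: {effective_sample_size(n_students, per_school, icc):.0f}")
```

Under these assumed values, a student-level analysis would report standard errors roughly 3.6 times too small, and 3,200 clustered students would carry about as much information as 249 independent ones.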

In 2000, the Success for All Foundation received a grant from the U.S. Department of 
Education to carry out a three-year study that was intended to address these limitations of the 
prior research base. The ongoing study reported here was designed as a cluster randomized trial 
(CRT), with random assignment of a relatively large sample of high-poverty schools from across 
the United States. A distinguished group of scholars, including C. Kent McGuire, Steven 
Raudenbush, Rebecca Maynard, Jonathan Crane, and Ronald Ferguson, was appointed to serve 
as an oversight committee to ensure that all study procedures were appropriate. After initial 
difficulties recruiting the sample, 41 elementary schools were recruited from across 11 states. 
The design primarily compared baseline kindergarten and first grade students nested within 
schools that were randomized into a grade K-2 Success for All treatment condition to 
kindergarten and first grade students whose schools were randomized into a grade 3-5 Success 
for All treatment condition. Thus, the kindergarten and first grade students within the former 
schools received the Success for All intervention — and served as the treatment cases — and the 
kindergarten and first grade students within the latter schools continued with their current 
reading programs — and served as the controls. 

An analysis of the first-year achievement data for the main kindergarten and first-grade 
sample was carried out by Borman et al. (2005). Using hierarchical linear modeling (HLM) 
techniques, with students nested within schools, Borman and colleagues reported school-level 
treatment effects of assignment to Success for All on four reading measures. They found 
statistically significant positive effects on the Woodcock Word Attack scale, but no effects on 
three other reading measures. The effect size for Word Attack was d = 0.22, which represents 
more than 2 months of additional learning gains. In this article, we track the second-year impact 
estimates for the main sample from this three-year study. 

The Success for All Program Theory 

Beyond the applied and empirical evidence supporting Success for All, the central 
features of the program’s theory of action — its comprehensive approach to school reform and its 
focus on high-quality literacy instruction — can be understood on the basis of two distinct 
conceptual frameworks. A review by Ramey and Ramey (1998) of the major findings from 
rigorous evaluations of early interventions revealed the features associated with those programs 
that exhibited the strongest and most persistent effects on children’s outcomes. The framework 
derived from this review, biosocial developmental contextualism, predicts that fragmented, weak 
efforts in early intervention are not likely to succeed, whereas intensive, high-quality, 
ecologically pervasive interventions can and do. The Success for All program design 
exemplifies this comprehensive and intensive approach to early intervention in two important 
ways. 

First, rather than a targeted, and potentially fragmented, remedial approach for improving 
the school outcomes of children placed at risk, Success for All emphasizes a school-wide focus 
on reform and improvement. Though components, such as one-on-one tutoring, are in place to 
target children who need extra help, the program theory emphasizes the school as the critical unit 
of intervention and espouses a comprehensive school-level reform plan to help meet the needs of 
all children attending high-poverty schools. Second, the learning experiences offered through 
Success for All are delivered to students directly, efficiently, and effectively in classrooms that 
are regrouped according to students’ current achievement levels. The intensity of these services 
is reflected by characteristics such as students’ daily exposure to an uninterrupted 90-minute 
reading period and the program’s emphasis on a multi-year approach to literacy instruction and 
learning. 

Beyond the comprehensiveness and intensity of Success for All, the core focus of the 
program is on literacy. The importance of literacy can be understood through research that has 
demonstrated that reading skills provide a critical part of the foundation for children’s overall 
academic success (Whitehurst & Lonigan, 2001). Children who read well read more and, as a 
result, acquire more knowledge in various academic domains (Cunningham & Stanovich, 1998). 
The specific sequencing of literacy instruction across the grades is a defining characteristic of the 
Success for All instructional program that can be understood through a larger body of empirical 
research and theory on beginning reading. 

The Success for All reading program in kindergarten and first grade emphasizes the 
development of language skills and launches students into reading using phonetically regular 
storybooks and instruction that focuses on phonemic awareness, auditory discrimination, and 
sound blending. The theoretical and practical importance of this approach for the beginning 
reader is supported by the strong consensus among researchers that phonemic awareness is the 
best single predictor of reading ability, not just in the early grades (Ehri & Wilce, 1980; 1985; 
Perfetti, Beck, Bell, & Hughes, 1987) but throughout the school years (Calfee, Lindamood, & 
Lindamood, 1973; Shankweiler et al., 1995). As this awareness is the major causal factor in 
early reading progress (Adams, 1990), appropriate interventions targeted to develop the skill 
hold considerable promise for helping students develop broader reading skills in both the short 
and long term. 

During the second through fifth grade levels, students in Success for All schools use 
school- or district-provided reading materials, either basals or trade books, in a structured set of 
interactive opportunities to read, discuss, and write. The program offered from second through 
fifth grade emphasizes cooperative learning activities built around partner reading; identification 
of characters, settings, and problem solutions in narratives; story summarization; writing; and 
direct instruction in reading comprehension skills. Through these activities, and building on the 
early phonemic awareness developed in grades K-1, students in Success for All schools learn a 
broader set of literacy skills emphasizing comprehension and writing. 



Implications and Hypotheses 
The evidence and theory supporting Success for All suggest several important 
implications for the current study. First, Success for All is best understood as a comprehensive 
school-level intervention. Accordingly, we designed the study as a cluster randomized trial, with 
41 schools randomized to a treatment or control condition, and we specified school-level 
analyses of the treatment effects of Success for All within a multilevel framework nesting 
students within the school-level clusters. Second, given the importance of program intensity and 
Success for All’s multi-year approach to literacy instruction, we hypothesized that the program 
effects estimated for the longitudinal sample of students who had experienced the full program 
across two years would be larger in magnitude than those effects found for the sample of all 
students, which included both the longitudinal sample and the group of students who had moved 
into the schools between the pretest and the Year 2 posttest. 

Third, consistent with the program theory related to the sequencing of the literacy 
instruction, which focuses on phonemic awareness skills initially and broader reading skills later, 
we hypothesized that the program effects would spread into other tested literacy domains beyond 
the Word Attack subtest. 1 Unlike the first-year impacts, which were restricted to the Word 
Attack subtest, we hypothesized that we would begin to find treatment effects on the Letter and 
Word Identification and Passage Comprehension subtests as well. Again, though, due to the 
importance of program intensity, the multi-year sequencing of literacy instruction, and the 
general importance of learning phonemic awareness skills early, we expected that the effects 
across the tests of broader reading skills would be most pronounced for students from the two- 
year longitudinal sample. 

1 The decoding of non-words is considered the most appropriate measure of phonological recoding (Hoover & 
Gough, 1990; Siegel, 1993; Wood & Felton, 1994). It provides an indication of the capacity to transfer the auditory 
skill of phonological awareness to the task of decoding print. The degree to which students are able to use their 
developing phonemic awareness is directly assessed using the Word Attack subtest of the Woodcock Reading Mastery 
Tests-Revised, which is composed of test items that ask the child to decode nonsense words. 
Method 

Sample selection 

The total sample of 41 schools was recruited in two phases. The initial pilot efforts 
focused on reducing the cost to schools of implementing Success for All, which would ordinarily 
require schools to spend about $75,000 in the first year, $35,000 in the second year, and $25,000 
in the third year. During the spring and summer of 2001, a one-time payment of $30,000 was 
offered to all schools in exchange for participation in the study. Those schools randomly 
assigned to control could use the incentive however they wished, and were allowed to purchase 
and implement any innovation other than Success for All. The schools randomized into the 
Success for All condition began implementing the program in grades K-5 during the fall of 2001 
and applied the incentive to the first-year costs of the program. During the pilot phase, only six 
schools were attracted by this incentive, with 3 randomly assigned to the experimental condition 
and 3 to the control condition. This sample was far from sufficient. 
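The cost figures in this paragraph can be tallied directly. The following back-of-the-envelope calculation uses only the amounts stated above and shows how much of the standard expense the one-time pilot incentive offset; it ignores any other costs a school might bear.

```python
# Per-school costs of Success for All as stated in the text (US dollars).
standard_costs = {1: 75_000, 2: 35_000, 3: 25_000}
pilot_incentive = 30_000  # one-time pilot-phase payment

three_year_total = sum(standard_costs.values())
# Experimental pilot schools applied the incentive to first-year program costs.
net_first_year = standard_costs[1] - pilot_incentive
share_of_first_year = pilot_incentive / standard_costs[1]

print(f"three-year cost: ${three_year_total:,}")
print(f"first-year cost after incentive: ${net_first_year:,}")
print(f"incentive covers {share_of_first_year:.0%} of first-year cost")
```

On these figures the incentive covered 40% of first-year costs but only about 22% of the $135,000 three-year total, which may help explain why so few schools were attracted by it.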

A second cohort of 35 schools was recruited to begin implementation in fall of 2002. In 
this cohort, all participating schools received the Success for All program at no cost, but 18 
received it in grades K-2 and 17 in grades 3-5, determined at random. Grades K-2 in the schools 
assigned to the 3-5 condition served as the controls for the schools assigned to the K-2 condition, 
and vice versa. As discussed by Borman et al. (2005), this design, which included both treatment 
and control conditions within each school, had advantages and disadvantages. 

The design provided a sufficient incentive for the successful recruitment of 
schools, and it produced valid counterfactuals for the experimental groups that represented what 
would have happened had the experiment not taken place. The limitation of the design, however, 
was that the instructional program in the treatment grades might influence instruction in the 
non-treatment grades. Observations of Success for All treatment fidelity have not documented 
contamination of this kind, but to the extent that it took place, it would have 
depressed the magnitude of the treatment impacts. In addition, having the two treatments in the 
same school may have reduced the estimated effectiveness of school-level aspects of Success 
for All, such as family support, because both control students and treatment students could have 
come forward to take advantage of these services. Both limitations would result in underestimation, 
rather than overestimation, of the treatment effects. Moreover, the treatment fidelity observations 
suggested that materials and instructional procedures in the Success for All and non-Success for All 
grades were distinct from each other and that few if any control students benefited directly from 
school-level Success for All services. 

During both phases of the study, the random assignment was carried out after schools had 
gone through the initial Success for All buy-in and adoption process, which all schools go 
through when applying to implement Success for All. After the schools had hosted an awareness 
presentation by an authorized Success for All program representative and after 80% of the school 
staff had voted affirmatively by secret ballot to move forward with the Success for All program 
adoption, they were eligible for the study. As a final requirement, all schools agreed to allow for 
individual and group testing of their children, to allow observers and interviewers access to the 
school, and to make available (in coded form, to maintain confidentiality) routinely collected 
data on students, such as attendance, disciplinary referrals, special education placements, 
retentions, and so on. The schools were required to agree to allow data collection for three years, 
and to remain in the same treatment condition for all three years of the study. The schools that 
went through this initial process and that agreed to these conditions were randomly assigned by 
the members of the Oversight Committee to experimental or control conditions. 
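The screening-then-randomization sequence described above can be sketched as follows. The text does not describe how the Oversight Committee carried out the draw, so this is not their procedure: the `School` record, its fields, the function names, and the seed are all hypothetical, and the 80% vote threshold is read as "at least 80%". The 18/17 split mirrors the second recruitment cohort.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class School:
    """Hypothetical record of a school's progress through the buy-in process."""
    name: str
    hosted_awareness_presentation: bool
    staff_vote_share: float           # share of staff voting yes by secret ballot
    agreed_to_study_conditions: bool  # testing, observer access, coded data, 3 years


def is_eligible(school: School) -> bool:
    # Assumes "80% of the school staff" means at least 80%.
    return (school.hosted_awareness_presentation
            and school.staff_vote_share >= 0.80
            and school.agreed_to_study_conditions)


def assign_conditions(schools, n_k2: int, seed: int) -> dict:
    """Randomly assign eligible schools to the K-2 or 3-5 condition."""
    eligible = sorted((s for s in schools if is_eligible(s)), key=lambda s: s.name)
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible and auditable
    rng.shuffle(eligible)
    return {s.name: ("K-2" if i < n_k2 else "3-5") for i, s in enumerate(eligible)}


def role_in_k2_analysis(condition: str) -> str:
    """Grades K-2 are treated only in K-2 schools; 3-5 schools supply the K-2 controls."""
    return "treatment" if condition == "K-2" else "control"


# Usage with made-up schools: 35 eligible, plus one that fell short of the 80% vote.
schools = [School(f"school_{i:02d}", True, 0.85, True) for i in range(35)]
schools.append(School("school_x", True, 0.70, True))
assignment = assign_conditions(schools, n_k2=18, seed=2002)
roles = {name: role_in_k2_analysis(cond) for name, cond in assignment.items()}
```

Sorting the eligible schools by name before shuffling keeps the result independent of the order in which schools were entered, so anyone holding the seed can reproduce the assignment.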
After the first year, three schools in St. Louis, which were selected during the second 
phase of recruitment, were closed due to insufficient enrollments. These included one school 
implementing Success for All in grades K-2 and two implementing in grades 3-5. In addition, a 
school in Chicago refused to implement its assigned treatment in grades 3-5, but did allow 
continued assessment. Because this school was assigned to the 3-5 treatment rather than the 
K-2 Success for All treatment, its refusal has little bearing on the current 
analysis of K-2 treatment effects. In future analyses of grade 3-5 program effects, this school 
will be included as an intent-to-treat case. The loss of the three St. Louis schools reduced the 
second-year analytic sample to 38 schools, 20 implementing Success for All in grades K-2 and 
18 in grades 3-5 (hereafter, K-2 schools will be referred to as “experimental” and 3-5 as 
“control”). 

The experimental and control schools included in the Year 2 analyses of outcomes are 
listed in Table 1. The sample is largely concentrated in the urban Midwest (Chicago, St. Louis, 
and Indianapolis) and the rural and small town South, though there are some exceptions. The 
schools are situated in communities with high poverty concentrations, with just a few rural 
exceptions. Approximately 74% of the students participate in the federal free lunch program, 
which is similar to the 80% free lunch participation rate for the nationwide population of Success 
for All schools. The sample is more African American and less Hispanic than Success for All 
schools nationally. Overall, 57% of the sample is African American, compared to about 40% 
within the typical Success for All school, and 11% of the sample is Hispanic, compared to the 
national average of 35%. The percent of white students, 29%, is similar to the Success for All 
percent white of about 25%. 
INSERT TABLE 1 HERE 



Table 2 compares the baseline characteristics of the experimental and control schools 
included in the analyses of Year 2 outcomes. As the results suggest, the 20 experimental and 18 
control schools were well matched on demographics, and there were no statistically significant 
school-level aggregate pretest differences on the Peabody Picture Vocabulary Test. As 
demonstrated in Borman et al. (2005), the original sample of 21 treatment and 20 control schools 
was also well matched, with no statistically significant differences on demographics or pretest 
scores. 



INSERT TABLE 2 HERE 



Treatment Fidelity 

Trainers from the Success for All Foundation have made quarterly implementation visits 
to each school, as is customary in all implementations of the Success for All program. These 
visits established each school’s fidelity to the Success for All model and provided trainers an 
opportunity to work with school staff in setting goals towards improving implementation. Many 
efforts were made to ensure fidelity of the experimental treatment. As is the case in all 
implementations, teachers in Success for All schools received three days of training and then 
about eight days of on-site follow-up during the first implementation year. Success for All 
Foundation trainers visited classrooms, met with groups of teachers, looked at data on children’s 
progress, and gave feedback to school staff on implementation quality and outcomes. These 
procedures, followed in all Success for All schools, were used in the study schools to promote 
a high level of fidelity of implementation. 

As of January 2005, all grade K-2 classes in the study schools were implementing their assigned 
treatments. There was some variability in implementation quality, which will be the subject of 
future analyses. For instance, several schools took almost one year to understand and implement 
the program at a mechanical level and others embraced the program immediately and have done 
an excellent job. The difficulties in recruiting schools and the last minute recruitment of many of 
them significantly inhibited quality implementation in many schools, as Success for All schools 
would have typically done much planning before school opening that many of the study schools 
(especially in Chicago, St. Louis, and Guilford County, NC) did not have time to do. 

In the non-Success for All grades, teachers were repeatedly reminded to continue using 
their usual materials and approaches, and not to use anything from Success for All. During 
implementation visits, trainers also observed classrooms from control grades. Specifically, these 
observations focused on whether the environment, instruction, and behaviors in the control 
classrooms resembled the characteristics of the Success for All classrooms. In no case did the 
trainers observe teachers in non-Success for All classes implementing Success for All 
components. It is possible that some ideas or procedures from Success for All did influence 
instruction in the non-treatment control grades, but any such influence was apparently subtle. 
Instructional materials and core procedures were clearly distinct from each other in the treatment 
and control grades. 

Measures 

Students in grades K-1 were pretested on the Peabody Picture Vocabulary Test and then 
individually posttested on the Woodcock Reading Mastery Tests — Revised (WMTR). The six 
schools from the first phase of recruitment were pretested in fall 2001 and posttested in 
spring 2002 and spring 2003. The 35 schools from the main sample were pretested in fall 
2002 and posttested in spring 2003 and spring 2004. The pilot and main samples were combined 
for the analyses. Because the metrics of the tests varied, and to aid in interpretation of the impact 
estimates, we standardized the pretest and the posttests to a mean of 0 and standard deviation of 
1. 
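The standardization step can be sketched as a z-score transformation. The text does not say which reference group or standard deviation formula was used, so this sketch assumes the sample standard deviation of the pooled analytic sample; the raw scores are hypothetical. On the standardized scale, a difference in group means is expressed in standard deviation units, which is what lets the impact estimates be read as effect sizes.

```python
import statistics


def standardize(scores):
    """Rescale raw scores to mean 0 and standard deviation 1 (z-scores).

    Assumes the sample SD of the pooled sample; the reference group
    used in the study is not specified in the text.
    """
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(x - mean) / sd for x in scores]


# Hypothetical raw Word Attack scores: first three treatment, last three control.
raw = [20, 14, 17, 12, 15, 9]
z = standardize(raw)
treatment_z, control_z = z[:3], z[3:]

# On the z-scale, the mean difference equals the raw difference divided by the SD.
difference_in_sd_units = statistics.mean(treatment_z) - statistics.mean(control_z)
print(f"{difference_in_sd_units:.2f} SD")
```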

Pretests. All children were individually assessed in fall, 2001 (first phase) or fall, 2002 
(second phase) on the Peabody Picture Vocabulary Test (PPVT III). The few children who were 
Spanish-dominant were pretested in Spanish on the Test de Vocabulario en Imagenes Peabody. 

Posttests. During the spring of 2002 and 2003 (first phase) and the spring of 2003 and 
2004 (second phase) — and during each subsequent spring through 2005 — students in the main 
longitudinal cohorts (which started in K-1) were individually assessed on the four subtests of the 
Woodcock Reading Mastery Tests — Revised (WMTR): Letter Identification, Word 
Identification, Word Attack, and Passage Comprehension. The WMTR was normed on a 
national sample of children and the internal reliability coefficients for the four subtests used were 
0.84, 0.97, 0.87, and 0.92, respectively. Children in the initial cohorts are being followed into 
any grade as long as they remain in the same school; retention does not change their cohort 
assignment. They are also being followed into special education. Children who entered Success 
for All or control schools after fall, 2002 are also posttested each year and included in analyses 
that combine the baseline cohorts and in-moving student cohorts. Children who are English 
language learners but are taught in English are posttested in English each year. In this analysis, 
we focused on the outcomes for the Year 2 posttests, which were administered to students in the 
pilot schools during spring 2003 and students in the main sample of schools during spring 2004. 
Results 

The prior review of baseline data for the school-level sample revealed no important 
differences between treatment and control schools and showed that the sample of schools was 
geographically diverse and generally representative of the population of Success for All schools. 
In discussing the results of our second-year analyses of achievement outcomes, we begin by 
assessing whether there was differential data and sample attrition between treatment and control 
schools, or systematic attrition from the analytical sample that may have changed its 
characteristics relative to those for the baseline sample. 

The final analytical samples were composed of 1,672 students from the 20 Success for 
All treatment schools and 1,618 students from the 18 control schools. Listwise deletion of 
student cases with missing posttest data did not cause differential attrition rates by program 
condition, χ²(1, N = 5,736) = 2.02, p = 0.67, leaving 56% of the baseline sample of 2,966 
treatment students and 58% of the 2,770 baseline controls for the preliminary analyses. The data 
and sample attrition occurred for three reasons. Of the students who were excluded from the 
analysis, 1,195 (49%) were dropped because they had moved out of the school before the Year 2 
posttests were administered and, thus, had no outcome data, and 1,021 (42%) remained in the 
treatment and control schools but were missing either pretest, posttest, or other important 
demographic data. Finally, the closure of three participating schools prevented posttesting of 
230 students (9%) in Year 2. 
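The sample counts in this paragraph are internally consistent, and a short tally makes the bookkeeping explicit. The sketch uses only figures reported above: it recomputes the retention rate in each condition and each exclusion reason's share of the dropped students.

```python
# Sample counts as reported in the text.
baseline = {"treatment": 2_966, "control": 2_770}
analytic = {"treatment": 1_672, "control": 1_618}
exclusions = {
    "moved out before the Year 2 posttest": 1_195,
    "missing pretest, posttest, or demographic data": 1_021,
    "school closure": 230,
}

retention = {group: analytic[group] / baseline[group] for group in baseline}
n_baseline = sum(baseline.values())
n_analytic = sum(analytic.values())
n_excluded = n_baseline - n_analytic

# The three exclusion reasons should account for every dropped student.
assert n_excluded == sum(exclusions.values())
shares = {reason: count / n_excluded for reason, count in exclusions.items()}

for group, rate in retention.items():
    print(f"{group}: {rate:.0%} retained")
for reason, share in shares.items():
    print(f"{reason}: {share:.0%} of exclusions")
```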

We compared the pretest scores of those treatment students who were dropped from the 
analyses to the pretest scores of the control students who were dropped from the analyses. No 
statistically significant difference was found between the treatment and the control students, t = 
0.50, p = 0.62 (two-tailed), suggesting that the treatment and control group students who were 
dropped from the analyses had similar baseline achievement levels. 
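The text does not report degrees of freedom or the exact test variant used for this comparison. Because more than a thousand students were dropped in each condition (2,966 - 1,672 treatment and 2,770 - 1,618 control), the t distribution is effectively standard normal, so the reported p-value can be checked with a normal approximation. The sketch assumes a Welch two-sample t statistic as one common choice; the simulated pretest scores are hypothetical.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch two-sample t statistic (does not assume equal variances)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def two_sided_p_large_sample(t):
    """Two-tailed p-value that treats t as standard normal.

    Adequate when both groups contain hundreds of students or more;
    with small groups, use the t distribution instead.
    """
    return math.erfc(abs(t) / math.sqrt(2.0))


# Check the reported statistic: |t| = 0.50 corresponds to p of about 0.62.
print(f"{two_sided_p_large_sample(0.50):.2f}")

# Hypothetical pretest z-scores for dropped students, sized like the dropped groups.
rng = random.Random(1)
dropped_treatment = [rng.gauss(0.0, 1.0) for _ in range(2_966 - 1_672)]
dropped_control = [rng.gauss(0.0, 1.0) for _ in range(2_770 - 1_618)]
t_sim = welch_t(dropped_treatment, dropped_control)
p_sim = two_sided_p_large_sample(t_sim)
```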

To address the issue of external validity, we also compared those students who were 
retained in the analysis to students who were not retained. Those students who were retained had 
higher pretest scores than those who were not retained, t = -6.74, p < .05 (two-tailed). Also, not 
surprisingly, mobile students who had left the Success for All and control schools were 
overrepresented among those with missing data, χ²(1, N = 5,736) = 3,457.99, p < .001. Thus, 
both low-achieving and mobile students from the sample schools were underrepresented in the 
analyses, which compromises the external validity of the study in two ways. First, because 
past quasi-experimental evidence has consistently shown that Success for All tends to have the 
largest educational effects on students who are struggling academically (Slavin & Madden, 
2001), the omission of low-achieving students with missing posttest data who remained in the 
Success for All schools is most likely to result in downward biases of the treatment effect 
estimates. Second, because the primary missing data mechanism was mobility from the study 
schools, analysis of this longitudinal sample limits generalization to non-mobile students who 
remained in the baseline treatment and control schools. 

These limitations notwithstanding, random assignment and the missing-at-random assumption do not 
conflict in this experiment. That is, among the complete data observations, students assigned to 
control have covariate distributions similar to those of students assigned to treatment. As 
noted by Rubin (1976) and Little and Rubin (1987), the missing data process is ignorable if, 
conditional on treatment and fully observed covariates, the data are missing at random (MAR). 
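The ignorability argument can be illustrated with a small simulation. In the following sketch, every parameter value is hypothetical, and missingness is assumed to depend only on the observed pretest (low scorers are more likely to lose their posttest), not on treatment arm or on the unobserved posttest. This is an illustration of the MAR reasoning, not the authors' analysis.

```python
import math
import random
import statistics


def simulate_trial(n=20_000, true_effect=0.2, seed=2024):
    """Simulate a two-arm randomized trial with posttests missing at random (MAR).

    All parameter values are hypothetical; scores are on a z-score scale.
    Returns pretest and posttest lists for complete cases, keyed by arm.
    """
    rng = random.Random(seed)
    pre = {"treatment": [], "control": []}
    post = {"treatment": [], "control": []}
    for _ in range(n):
        pretest = rng.gauss(0.0, 1.0)
        arm = "treatment" if rng.random() < 0.5 else "control"  # random assignment
        effect = true_effect if arm == "treatment" else 0.0
        posttest = 0.6 * pretest + effect + rng.gauss(0.0, 0.8)
        # MAR: the chance that the posttest is missing depends only on the observed
        # pretest (low scorers are more likely to be missing), not on the arm.
        p_missing = 1.0 / (1.0 + math.exp(1.5 * pretest + 0.2))
        if rng.random() < p_missing:
            continue
        pre[arm].append(pretest)
        post[arm].append(posttest)
    return pre, post


pre, post = simulate_trial()
retained_pretest_mean = statistics.mean(pre["treatment"] + pre["control"])
pretest_gap = statistics.mean(pre["treatment"]) - statistics.mean(pre["control"])
estimated_effect = statistics.mean(post["treatment"]) - statistics.mean(post["control"])
print(f"mean pretest of retained students: {retained_pretest_mean:.2f}")
print(f"treatment-control pretest gap:     {pretest_gap:.3f}")
print(f"complete-case impact estimate:     {estimated_effect:.3f}")
```

In this setup the retained students are positively selected on the pretest, much like the higher pretest scores of retained students reported above. However, the two arms remain balanced on the pretest, so the simple complete-case contrast stays close to the assumed effect of 0.2. That would no longer hold if missingness depended on the unobserved posttest or differed by arm in ways the observed covariates do not capture.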

In addition to the analyses of impacts for the longitudinal sample, we conducted a second set of analyses of treatment effects for the combined longitudinal and in-moving student samples. These analyses included the longitudinal sample of 3,290 students, who remained at the Success for All and control schools from baseline through the second-year posttest, and 890 additional students who had moved into the experimental and control schools after the baseline assessments. Though the in-moving students did not benefit from the full Success for All intervention, this combined longitudinal and in-mover sample comprises the complete enrollments of the targeted grade levels in the treatment and control schools at the time of the Year 2 posttest. In this way, the sample affords a type of school-level intent-to-treat analysis of the program.

Hierarchical Linear Model Analyses of Year 2 Treatment Effects 

This cluster randomized trial (CRT) involved randomization at the level of the school and collection of outcome data at the level of the student. With such a design, estimation of treatment effects at the level of the cluster that was randomized is the most appropriate method (Donner & Klar, 2000; Murray, 1998; Raudenbush, 1997). We applied the analytical strategy for CRTs proposed by Raudenbush (1997): the use of a hierarchical linear model. In this formulation, we simultaneously accounted for both student- and school-level sources of variability in the outcomes by specifying a two-level hierarchical model that estimated the school-level effect of random assignment. Our fully specified level-1, or within-school, model nested students within schools with an indicator of the student's baseline grade level (-0.5 = kindergarten and 0.5 = first grade). The linear model for this level of the analysis was expressed as



$$Y_{ij} = \beta_{0j} + \beta_{1j}(\mathrm{GRADE})_{ij} + r_{ij},$$
which represents the spring posttest achievement for student i in school j regressed on grade level, plus the level-1 residual, $r_{ij}$, that remained unexplained after accounting for the grade level of the students.

In this model, each student's grade level was centered around zero. In this way, we controlled for school-to-school differences in the proportions of kindergartners and first graders. With grade coded as -0.5 for students in kindergarten at baseline and 0.5 for students in first grade at baseline, the school-specific intercept, $\beta_{0j}$, which serves as the outcome at level 2, represented the overall average school performance of students from across both grade levels (the unweighted average of the two grade-level means). We treated the within-school grade-level gap, $\beta_{1j}$, that is, the difference between the posttest scores of baseline kindergarten and first grade students in school j, as fixed at level 2 because it was intended only as a covariate and we had no empirical or theoretical reason to model this source of between-school variability as an outcome.
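The effect of the ±0.5 coding can be checked numerically. The sketch below uses made-up scores for a single hypothetical school with unequal numbers of kindergartners and first graders. Fitting the within-school regression by ordinary least squares gives an intercept equal to the unweighted midpoint of the two grade-level means and a slope equal to the first-grade minus kindergarten gap, whereas the pooled student mean shifts with the school's grade mix.

```python
def ols_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x with a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Hypothetical school: 3 baseline kindergartners, 5 baseline first graders.
kinder = [0.1, -0.2, 0.4]
first = [0.9, 1.1, 0.7, 1.3, 0.8]
grade = [-0.5] * len(kinder) + [0.5] * len(first)   # centered grade code
score = kinder + first

b0, b1 = ols_line(grade, score)
mean_k = sum(kinder) / len(kinder)
mean_1 = sum(first) / len(first)
overall = sum(score) / len(score)

print(f"intercept {b0:.4f} vs midpoint {(mean_k + mean_1) / 2:.4f}")
print(f"slope {b1:.4f} vs grade gap {mean_1 - mean_k:.4f}")
print(f"pooled student mean {overall:.4f} (depends on the grade mix)")
```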

At level 2 of the model, we estimated the cluster-level impact of Success for All 
treatment assignment on the mean posttest achievement outcome in school j. As suggested by 
the work of Bloom, Bos, and Lee (1999) and Raudenbush (1997), we included a school-level 
covariate, the school mean PPVT pretest score, to help reduce the unexplained variance in the 
outcome and to improve the power and precision of our treatment effect estimates. The fully 
specified level 2 model was written as 

$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\mathrm{MEANPPVT})_j + \gamma_{02}(\mathrm{SFA})_j + u_{0j},$$

$$\beta_{1j} = \gamma_{10},$$

where the mean posttest intercept for school j, $\beta_{0j}$, was regressed on the school-level mean PPVT score and the SFA treatment indicator, plus a residual, $u_{0j}$. The within-school posttest difference between baseline kindergarten and first grade students, $\beta_{1j}$, was specified as fixed, predicted only by an intercept, $\gamma_{10}$.

2 We formulated other multilevel models that included the broader array of school-level covariates listed in Table 2. After including the school mean pretest covariate, though, these more complex models did not explain appreciably more between-school variance and did not improve the precision of the Success for All treatment effect estimates. For these reasons, we used the more parsimonious models presented.
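Substituting the level-2 equations into the level-1 equation gives the single combined model that these specifications imply. This restatement is added for clarity and assumes the usual HLM normality assumptions for the random terms. Here $\gamma_{02}$ is the school-level Success for All effect reported in the “SFA assignment” rows of Tables 3 and 4, while $\tau_{00}$ and $\sigma^2$ appear to correspond to the “School mean achievement” random effect and the “Within-school variation” entries in those tables.

```latex
Y_{ij} = \gamma_{00} + \gamma_{01}(\mathrm{MEANPPVT})_j + \gamma_{02}(\mathrm{SFA})_j
       + \gamma_{10}(\mathrm{GRADE})_{ij} + u_{0j} + r_{ij},
\qquad u_{0j} \sim N(0, \tau_{00}), \quad r_{ij} \sim N(0, \sigma^2).
```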

Outcomes for the Longitudinal Sample. The multilevel models, shown in Table 3, assessed student- and school-level effects on the four literacy outcomes as measured by the Woodcock-Johnson Year 2 posttests. Across the four outcomes, the impact estimate for Success for All assignment ranged from a standardized effect of approximately d = 0.12, for Passage Comprehension, to d = 0.25, for the Word Attack subtest. Three of the four treatment effects were statistically significant: the impact of 0.25 on Word Attack at the p < .01 level, the impact of 0.18 on Letter Identification at the p < .05 level, and the impact of 0.16 on Word Identification at the p < .10 level. In all four models, the school-level mean pretest covariate was an important predictor of the outcome, and the fixed within-school posttest difference between baseline kindergarten and first grade students ranged from nearly half of one standard deviation to more than three quarters of one standard deviation.
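The significance levels in this paragraph can be reproduced from the t ratios in the “SFA assignment” row of Table 3. This is a minimal sketch, assuming a two-sided test with J − 3 = 35 degrees of freedom (as in footnote 3) and computing the Student t tail probability by Simpson's rule with only the standard library; the helper names are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, n=2000):
    """Two-sided p-value: 1 - 2 * P(0 < T < |t|), integrated by Simpson's rule."""
    a = abs(t)
    h = a / n                                   # n must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 1.0 - 2.0 * s * h / 3


def stars(p):
    """Significance flags used in Table 3: + p < .10, * p < .05, ** p < .01."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.10:
        return "+"
    return ""


# t ratios for SFA assignment in Table 3 (longitudinal sample).
table3_t = {"Word Attack": 2.85, "Letter Identification": 2.23,
            "Word Identification": 1.85, "Passage Comprehension": 1.42}
for outcome, t in table3_t.items():
    p = two_sided_p(t, df=35)
    print(f"{outcome:<22} t = {t:.2f}  p = {p:.3f} {stars(p)}")
```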
3 The statistical precision of the design can be expressed in terms of a minimum detectable effect, or the smallest treatment effect that can be detected with confidence. As Bloom (2005) noted, this parameter, which is a multiple of the impact estimator's standard error, depends on: whether a one- or two-tailed test of statistical significance is used; the α level of statistical significance to which the result of the significance test will be compared; the desired statistical power, 1 − β; and the number of degrees of freedom of the test, which equals the number of clusters, J, minus 2 (assuming a two-group experimental design and no covariates).

The minimum detectable effect for our design is calculated for a two-tailed t-test, an α level of .10, power (1 − β) equal to 0.80, and degrees of freedom equal to J = 38 schools minus 3 (a two-group experimental design with the school mean PPVT pretest covariate). Referring to Tables 3 and 4 for the Success for All impact estimators' standard errors, which range from .08 to .12, and employing Bloom's (in press) minimum detectable effect multiplier, we calculated minimum detectable effects of approximately d = .20 to d = .30. That is, our design had adequate power to detect school-level treatment-control differences of at least .20 to .30 standard deviations.
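The minimum detectable effect calculation in this footnote can be sketched as follows. This assumes the common two-tailed form of the multiplier, the t quantile at 1 − α/2 plus the t quantile at the power level 1 − β, both with J − 3 = 35 degrees of freedom. It illustrates that formula rather than reproducing Bloom's own procedure, and it computes Student t quantiles by Simpson's rule and bisection using only the standard library.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf_nonneg(x: float, df: int, n: int = 2000) -> float:
    """P(T <= x) for x >= 0: 0.5 plus Simpson's-rule area on [0, x] (n even)."""
    h = x / n
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + s * h / 3


def t_quantile(p: float, df: int) -> float:
    """Quantile of the t distribution for 0.5 <= p < 1, found by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf_nonneg(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mde_multiplier(n_clusters: int, n_covariates: int = 1,
                   alpha: float = 0.10, power: float = 0.80) -> float:
    """Two-tailed multiplier t(1 - alpha/2) + t(power), df = J - 2 - covariates."""
    df = n_clusters - 2 - n_covariates
    return t_quantile(1 - alpha / 2, df) + t_quantile(power, df)


m = mde_multiplier(38)  # J = 38 schools, one school-level covariate -> df = 35
print(f"multiplier: {m:.2f}")
for se in (0.08, 0.12):  # range of SFA standard errors in Tables 3 and 4
    print(f"SE {se:.2f} -> minimum detectable effect d = {m * se:.2f}")
```

With these inputs the multiplier comes out at about 2.54, so standard errors of .08 and .12 translate into minimum detectable effects of roughly d = .20 and d = .30. Setting `alpha=0.05` shows how a conventional significance level would raise that threshold.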
Outcomes for the Combined Longitudinal and In-mover Sample. In Table 4, the multilevel models estimate student- and school-level effects on the four Year 2 literacy outcomes for the combined longitudinal and in-mover sample. The Success for All impact estimates across these multilevel models were somewhat smaller in magnitude than the effects found for the longitudinal sample in Table 3. Across the four outcomes, the impact estimate for Success for All assignment ranged from a standardized effect of approximately d = 0.09, for Passage Comprehension, to d = 0.19, for Word Attack. In addition to being somewhat smaller, the impacts were estimated with greater uncertainty, as indicated by the larger standard errors for the treatment coefficients. The smaller impacts and greater uncertainty of the impact estimates are, in part, explained by the greater variability among students in their exposure to the Success for All treatment: in this sample, students' participation in the Success for All treatment ranged from several months to two years.



INSERT TABLE 4 HERE 



Discussion 

The second-year results of the randomized evaluation of Success for All show school-level impacts of assignment to the intervention that are extraordinarily consistent with key aspects of the program theory and with past meta-analytic evidence on program effects. The cluster randomized trial provided strong evidence that the Success for All comprehensive school reform strategy is capable of producing school-level effects of both statistical and practical importance. The magnitudes of these effects across the four literacy outcomes are summarized in Table 5 as effect sizes and months of additional learning relative to control schools. When converted to additional months of learning, the practical effects of the program appear substantial for Word Attack and relatively large for the other literacy measures. For Word Attack, the longitudinal sample's learning advantage relative to the controls (4.69 months) exceeds half of one nine-month school year.

Consistent with the program theory related to the sequencing of the literacy instruction, which focuses on phonemic awareness skills initially and broader reading skills later, we found that the program effects began to spread into other tested literacy domains beyond the Word Attack subtest. Unlike the first-year impacts, which were restricted to the Word Attack subtest, we found statistically significant treatment effects on the Letter Identification and Word Identification subtests as well.

The reliability and magnitude of these effects, though, are sensitive to the amount of exposure students had to the intervention. Consistent with the program theory, which stresses program intensity, the multi-year sequencing of literacy instruction, and the importance of learning phonemic awareness skills early, the effects across the four reading skills tested are generally larger and more reliable for the two-year longitudinal sample. Therefore, although Success for All is a comprehensive school-wide intervention, which may theoretically advance the academic outcomes of the whole school, students seem to need longitudinal exposure to the program in order to make the largest and most reliable gains. When one considers literacy achievement as the primary outcome, the multi-year sequencing of literacy instruction and the initial skill development in the area of phonemic awareness appear to be important mechanisms that drive broader improvements in reading.



INSERT TABLE 5 HERE 
Interestingly, the effect sizes reported in Table 5 are quite similar to the overall average
effect size of d = 0.20 estimated by Borman et al. (2003) in their synthesis of the quasi- 
experimental evaluations of Success for All and Roots and Wings. This result is consistent with 
findings reported by Lipsey and Wilson (1993) and Heinsman and Shadish (1996), who 
concluded that the mean value of the findings of a large number of non-experimental studies 
tends to approximate that of experiments that address the same question. In the context of the 
current study, the meta-analytic summary of the results of 46 quasi-experimental studies of 
Success for All produced an effect size estimate that was essentially the same as the impact 
estimates from this randomized experiment. 

The effects are smaller, though, than those found in the early small-scale matched 
comparison-group studies performed by the developers using the same individually administered 
measures as those used in the present study. Earlier studies such as these, reported by Madden, 
Slavin, Karweit, Dolan, and Wasik (1993) and Slavin and Madden (2001), found effect sizes 
ranging from d = 0.30 to d = 0.50. As Lipsey (2003) noted, the search for a generalization about 
the comparability of the results found in randomized versus nonrandomized designs generally 
features the design type as the moderator variable of primary interest. However, as Lipsey found, the program type (that is, a research and demonstration project versus a routine practice program) is also important for understanding differences in study outcomes.

Similar to the concept of the super-realization stage of program development articulated 
by Cronbach et al. (1980), the research and demonstration projects typically involve the program 
developers in the design and implementation of a small-scale pilot of an intervention operating at 
its best. As one might expect, given the small scale and direct and sustained involvement of 
developers, research and demonstration projects tend to be more amenable to randomized designs than are studies of routine practice programs. There are also key differences between the impact estimates derived from the two types of studies. Using meta-analytic data on the effects of intervention programs to prevent or reduce juvenile delinquency, Lipsey (2003) demonstrated that research and demonstration programs were associated with larger mean effects but that these studies more often employed random assignment designs, which were associated with smaller mean effects. Therefore, both the methodological design and the program type are important for understanding differences in impact estimates, and both have implications for interpreting and generalizing studies of educational interventions.

Conclusion 

The current findings of statistically significant positive achievement effects from a large-scale randomized field trial of a routine practice program are unusual for studies in education. A study of this kind, with 38 schools serving more than 20,000 children in
districts throughout the United States, provides a rigorous assessment of the impact that can be 
expected when a program is scaled up in a real-world policy context. These effects may be 
interpreted as the impacts likely to be obtained in broad-scale implementations of Success for 
All, with all the attendant problems of start-up and of maintaining quality at scale. The research 
includes schools with good and poor implementations, and with school and district staffs 
potentially less committed to the program than is usual because they were not paying for it. In the vast majority of experimental schools, the design also required implementation of the instructional program across only three grades rather than across the whole school. In the remaining six pilot schools included in the study, the control schools were provided the same supplemental funds offered to the experimental schools and were permitted to engage in any reform other than Success for All. For these reasons, we believe that the impact estimates are realistic but also somewhat conservative.

Similar recent efforts to study widespread implementations of educational interventions 
through a randomized design have yielded few promising outcomes. For instance, the evaluation 
of 44 schools implementing 21st-Century Community Learning Centers (21st CCLC) programs, 
which help schools and districts partner with community organizations to provide after-school 
programs, revealed no positive impacts on academic outcomes, homework completion, student 
behavior, or parent involvement (James-Burdumy et al., 2005). Similarly, the experimental 
study of 18 Even Start family literacy programs revealed no treatment effects on literacy and
other measures for the children and parents served by the programs (St. Pierre et al., 2003).
Indeed, aside from the Tennessee Student/Teacher Achievement Ratio (STAR) class-size reduction
study (Finn & Achilles, 1999), there are few examples of substantial field trials in education that 
have yielded positive effects of practical and statistical significance across a large number of 
sites. 

Though large-scale cluster randomized trials in field settings are rare in education and 
though findings of widespread treatment effects are rarer still, these outcomes attest that such 
studies are possible and can produce unbiased estimates of program effects. Future reports will 
illuminate many other aspects of the implementation and program impacts. These will include 
studies of the quality of program implementation, effects for subgroups, and effects in the upper 
elementary grades. 
Table 1

Schools Participating in Randomized Evaluation of Success for All, Grouped by Assignment. 



                                                                  %  % African          %          %          %     % Spec     % Free
School                  District         ST   Enrollment      White   American   Hispanic     Female        ESL         Ed      Lunch

Northwood               Mooresville      IN          424      94.70       0.00       0.00      49.50       0.00       4.00      26.00
Jefferson               Midland          OH          315      98.50       2.00       0.00      49.00       0.00      16.00      29.00
Bertha S. Stemberger    Guilford         NC          381      62.00      30.00       5.00      50.00       4.00      20.00      30.00
Pleasant Garden         Guilford         NC          698      76.00      13.70       4.56      49.50       3.50      16.00      33.00
Waveland                S Montgomery     IN          152      99.00       0.00       1.00      46.00       0.00      15.00      34.00
James Y. Joyner         Guilford         NC          454      41.80      42.90       5.00      48.50       0.05      38.00      35.00
Laurel Valley           Ligonier Valley  PA          409      99.98       0.01       0.01      47.00       0.00       9.00      51.00
Wood                    Tempe            AZ          642      19.60       9.60      40.10      49.70      37.50      11.00      51.70
Cesar Chavez            Norwalk          CA          589       6.00       3.00      88.00      47.00      39.00       5.00      71.00
Haven                   Savannah         GA          373       0.00      99.00       0.00      65.00       0.00       5.00      85.00
Brian Piccolo           Chicago          IL         1069       0.10      79.50      20.10      48.00      11.40      11.60      85.60
Robert H. Lawrence      Chicago          IL          635       0.00      98.60       0.02      60.00       0.00      11.30      89.00
Harriett B. Stowe       Indianapolis     IN          174      28.71      25.74      43.56      38.00      40.59      17.80      90.59
Linden                  Linden           AL          291       1.00      97.00       1.00      49.00       0.00      15.90      91.00
Lafayette               St. Louis        MO          297      13.20      72.80       9.40      49.70      25.00      12.00      94.00
Benjamin E. Mays        Chicago          IL          263       0.00      95.00       5.00      49.00       0.00      10.00      95.00
Paramount Jr.           Greene           AL          493       0.00     100.00       0.00      41.00       0.00      11.00      97.00
Farragut                St. Louis        MO          350       0.00     100.00       0.00      44.00       0.00       4.00      98.00
Gundlach                St. Louis        MO          365       0.00     100.00       0.00      48.00       0.00       4.40      99.00
Earl Nash               Noxubee          MS          509       1.00      98.00       1.00      51.00       0.20       6.50     100.00
Treatment school means                               444      32.08      53.34      11.19      48.95       8.06      12.18      69.24

Newby                   Mooresville      IN          252      98.00       0.00       0.00      53.00       0.00      10.00      35.00
Jamestown               Guilford         NC          516      41.00      47.00       3.40      45.00       8.00      15.00      39.00
Central                 Central          KS          194      87.00       1.00       5.00      46.00       0.00      10.00      40.00
Walnut Cove             Walnut Cove      NC          355      79.00      17.00       1.00      51.00       0.01      26.00      41.00
Bluford                 Guilford         NC          420       9.50      86.70       1.20      48.00       0.00      13.00      42.00
Greenwood               Bessemer         AL          375      13.00      75.00      10.00      52.00      11.00      13.00      76.00
Gulfview                Hancock          MS          590      95.00       3.00       1.00      49.00       0.00       6.70      80.00
Eutaw                   Greene           AL          338       0.57      98.50       0.00      49.00       0.00       4.00      86.00
C. F. Hard              Bessemer         AL          619       0.20      99.70       0.00      46.00       0.00       6.00      90.00
Daniel Webster          Chicago          IL          671       0.00     100.00       0.00      44.00       0.00       5.00      95.00
Augustin Lara           Chicago          IL          616       0.50       0.40      98.60      50.00      54.00       8.70      95.70
Edward E. Dunne         Chicago          IL          531       0.00     100.00       0.00      48.00       0.00       3.80      96.00
Bunche                  Chicago          IL          643       0.00     100.00       0.00      68.00       0.00       9.70      98.00
Dewey Elem.             Chicago          IL          616       0.00      98.00       0.00      65.00      25.00       6.30      98.00
M. E. Lewis             Sparta           GA          631       1.16      97.67       0.00      50.60       0.16      17.30      99.00
Sigel Elem.             St. Louis        MO          340       8.10      85.30       1.80      45.00      18.00      16.00     100.00
South Delta             South Delta      MS          769       5.00      94.00       1.00      53.00       0.00       4.00     100.00
Stanfield               Stanfield        AZ          757      19.40       1.00      67.70      50.30      50.00      19.00     100.00
Control school means                                 512      25.41      61.35      10.59      50.72       9.23      12.18      78.59

Note: * Denotes the schools selected in the first phase of the study.
Table 2

Comparison of Characteristics for Success for All Treatment Schools (N = 20) and Control
Schools (N = 18).

                                                       95% CI for Difference
Variable              Condition    N       M       SD  Lower bound  Upper bound       t

PPVT                  Control     18   89.82     9.45        -8.30         4.27    0.65
                      Treatment   20   91.84     9.62

Enrollment            Control     18     512      172       -58.31       195.90    1.10
                      Treatment   20     444      209

% Female              Control     18   50.72     6.38        -2.03         5.60    0.91
                      Treatment   20   48.95     5.68

% Minority            Control     18   74.59    36.96       -18.83        32.16    0.53
                      Treatment   20   67.92    40.18

% ESL                 Control     18    9.23    17.17        -9.30        11.64    0.23
                      Treatment   20    8.06    14.61

% Special Education   Control     18   10.75     6.12        -6.08         3.22    0.62
                      Treatment   20   12.18     7.81

% Free Lunch          Control     18   78.59    25.80        -9.00        27.56    1.02
                      Treatment   20   69.24    28.97
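The Table 2 columns appear consistent with a pooled-variance two-sample t-test on school-level values, with the confidence interval taken for the control-minus-treatment difference. A minimal check for the PPVT row, assuming that test (the table itself does not name it) and the standard table value of about 2.028 for the two-sided 95% critical t with 36 degrees of freedom:

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2, t_crit):
    """Pooled-variance two-sample t statistic and CI for the difference m1 - m2."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = m1 - m2
    return diff / se, (diff - t_crit * se, diff + t_crit * se)


# PPVT row: control (n = 18) minus treatment (n = 20); df = 18 + 20 - 2 = 36.
T_CRIT_36 = 2.028  # assumed two-sided 95% critical value of t with 36 df (tables)
t_stat, (lo, hi) = pooled_t(89.82, 9.45, 18, 91.84, 9.62, 20, T_CRIT_36)
print(f"|t| = {abs(t_stat):.2f}, 95% CI for difference: ({lo:.2f}, {hi:.2f})")
```

Small discrepancies in the second decimal place are expected, because the table's means and standard deviations are themselves rounded.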
Table 3



Multilevel Models Predicting Student and School-Level Literacy Outcomes for the Longitudinal Sample. 









                            Word Attack              Letter Identification    Word Identification      Passage Comprehension
Fixed Effect                  Effect      SE      t    Effect      SE      t    Effect      SE      t    Effect      SE      t

School mean achievement
  Intercept                    -0.02    0.04  -0.53     -0.02    0.05  -0.35     -0.03    0.03   0.50     -0.03    0.04  -0.67
  Mean PPVT pretest           0.16**    0.04   4.07    0.24**    0.06   4.28    0.20**    0.03   6.62    0.26**    0.04   7.09
  SFA assignment              0.25**    0.09   2.85     0.18*    0.08   2.23     0.16+    0.09   1.85      0.12    0.08   1.42
Grade
  Intercept                   0.48**    0.05  10.36    0.88**    0.09   9.36    0.85**    0.04  19.01    0.82**    0.04  22.67

Random Effect               Estimate      χ²     df  Estimate      χ²     df  Estimate      χ²     df  Estimate      χ²     df
School mean achievement         0.06  221.16     35      0.07  442.71     35      0.06  278.58     35      0.06  284.54     35
Within-school variation         0.84                     0.65                     0.72                     0.72

Note: +p < .10; *p < .05; **p < .01.
Table 4



Multilevel Models Predicting Student and School-Level Literacy Outcomes for the Combined Longitudinal and In-mover Sample. 









                            Word Attack              Letter Identification    Word Identification      Passage Comprehension
Fixed Effect                  Effect      SE      t    Effect      SE      t    Effect      SE      t    Effect      SE      t

School mean achievement
  Intercept                    -0.02    0.06  -0.48      0.00    0.05   0.09     -0.01    0.04   0.50     -0.00    0.04  -0.01
  Mean PPVT pretest           0.20**    0.06   3.29    0.24**    0.06   4.25    0.20**    0.04   5.38    0.25**    0.04   6.79
  SFA assignment                0.19    0.12   1.56     0.18*    0.09   2.09      0.15    0.09   1.55      0.09    0.09   0.34
Grade
  Intercept                   0.44**    0.04  10.38    0.87**    0.08  10.50    0.81**    0.04  21.67    0.79**    0.03  23.21

Random Effect               Estimate      χ²     df  Estimate      χ²     df  Estimate      χ²     df  Estimate      χ²     df
School mean achievement         0.13  508.69     35      0.08  546.52     35      0.08  451.40     35      0.07  431.37     35
Within-school variation         0.81                     0.66                     0.72                     0.72

Note: *p < .05; **p < .01.
Table 5



Success for All Impact Estimates Expressed as Effect Sizes (d) and Additional Months of
Learning.ᵃ

                         Longitudinal Sample  Combined Sample
                                   Months of            Months of
Outcome                       d     Learning       d     Learning

Word Attack                0.25         4.69    0.19         3.89
Letter Identification      0.18         1.84    0.18         1.86
Word Identification        0.16         1.69    0.15         2.22
Passage Comprehension      0.12         1.32    0.09         1.03

Note: ᵃ The additional months of learning were computed in two steps. First, we divided each of the coefficients for the Success for All treatment effect on the four outcomes by the respective coefficient for the grade-level intercept shown in Tables 3 and 4. This one-grade-level difference, expressed by the intercept, approximates the amount of growth that occurs across one school year, so the ratio gives the proportion of one grade level that the treatment effect represents. Second, we estimated the duration of one school year as 9 months; multiplying the proportion from the first step by 9 converted the treatment effect coefficient into an estimate of additional months of learning.
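The two-step conversion in the note above can be written out directly. This minimal sketch applies it to the longitudinal-sample coefficients in Table 3, dividing the SFA assignment coefficient by the grade-level intercept and multiplying by the 9-month school year assumed in the note; the function and variable names are illustrative.

```python
SCHOOL_YEAR_MONTHS = 9  # duration of one school year, as assumed in the note


def months_of_learning(sfa_effect, grade_gap, months=SCHOOL_YEAR_MONTHS):
    """Step 1: share of one grade level; step 2: scale that share to months."""
    return months * sfa_effect / grade_gap


# (SFA assignment, grade intercept) from Table 3, longitudinal sample.
table3 = {
    "Word Attack": (0.25, 0.48),
    "Letter Identification": (0.18, 0.88),
    "Word Identification": (0.16, 0.85),
    "Passage Comprehension": (0.12, 0.82),
}
for outcome, (effect, gap) in table3.items():
    print(f"{outcome:<22} {months_of_learning(effect, gap):.2f} months")
```

These values reproduce the longitudinal-sample column of Table 5 (4.69, 1.84, 1.69, and 1.32 months).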
Appendix

Major Elements of Success for All

Success for All is a schoolwide program for students in grades pre-K to six that organizes resources to attempt to ensure that virtually every student will acquire adequate basic skills and build on this basis throughout the elementary grades, and that no student will be allowed to “fall between the cracks.” The main elements of the program are as follows:

A Schoolwide Curriculum. Success for All schools implement research-based reading, writing, and language arts programs in all grades, K-6. The reading program in grades K-1 emphasizes language and comprehension skills, phonics, sound blending, and use of shared stories that students read to one another in pairs. The shared stories combine teacher-read material with phonetically regular student material to teach decoding and comprehension in the context of meaningful, engaging stories.

In grades 2-6, students use novels or basals but not workbooks. This program emphasizes cooperative learning and partner reading activities, comprehension strategies such as summarization and clarification built around narrative and expository texts, writing, and direct instruction in reading comprehension skills. At all levels, students are required to read books of their own choice for twenty minutes at home each evening. Cooperative learning programs in writing/language arts are used in grades 1-6.

Tutors. In grades 1-3, specially trained certified teachers and paraprofessionals work one-to-one with any students who are failing to keep up with their classmates in reading. Tutorial instruction is closely coordinated with regular classroom instruction. It takes place 20 minutes daily during times other than reading periods.

Quarterly Assessments and Regrouping. Students in grades 1-6 are assessed every quarter to determine whether they are making adequate progress in reading. This information is used to regroup students for instruction across grade lines, so that each reading class contains students of different ages who are all reading at the same level. Assessment information is also used to suggest alternate teaching strategies in the regular classroom, changes in reading group placement, provision of tutoring services, or other means of meeting students’ needs.

Solutions Team. A Solutions Team works in each school to help support families in ensuring the success of their children, focusing on parent education, parent involvement, attendance, and student behavior. This team is composed of existing or additional staff such as parent liaisons, social workers, counselors, and assistant principals.

Facilitator. A program facilitator works with teachers as an on-site coach to help them implement the reading program, manages the quarterly assessments, assists the Solutions Team, makes sure that all staff are communicating with each other, and helps the staff as a whole make certain that every child is making adequate progress.





